A Machine Reading System for Assembling Synthetic Paleontological Databases
Internal identifier: 000115 (Main/Exploration); previous: 000114; next: 000116
Authors: Shanan E. Peters [United States]; Ce Zhang [United States]; Miron Livny [United States]; Christopher Ré [United States]
Source:
- PLoS ONE [1932-6203]; 2014.
Abstract
Many aspects of macroevolutionary theory and our understanding of biotic responses to global environmental change derive from literature-based compilations of paleontological data. Existing manually assembled databases are, however, incomplete and difficult to assess and enhance with new data types. Here, we develop and validate the quality of a machine reading system, PaleoDeepDive, that automatically locates and extracts data from heterogeneous text, tables, and figures in publications. PaleoDeepDive performs comparably to humans in several complex data extraction and inference tasks and generates congruent synthetic results that describe the geological history of taxonomic diversity and genus-level rates of origination and extinction. Unlike traditional databases, PaleoDeepDive produces a probabilistic database that systematically improves as information is added. We show that the system can readily accommodate sophisticated data types, such as morphological data in biological illustrations and associated textual descriptions. Our machine reading approach to scientific data integration and synthesis brings within reach many questions that are currently underdetermined and does so in ways that may stimulate entirely new modes of inquiry.
Url:
DOI: 10.1371/journal.pone.0113523
PubMed: 25436610
PubMed Central: 4250071
Affiliations:
Links to previous steps (curation, corpus...)
- to stream Pmc, to step Corpus: 000016
- to stream Pmc, to step Curation: 000016
- to stream Pmc, to step Checkpoint: 000062
- to stream Ncbi, to step Merge: 000218
- to stream Ncbi, to step Curation: 000218
- to stream Ncbi, to step Checkpoint: 000218
- to stream Main, to step Merge: 000116
- to stream Main, to step Curation: 000115
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">A Machine Reading System for Assembling Synthetic Paleontological Databases</title>
<author><name sortKey="Peters, Shanan E" sort="Peters, Shanan E" uniqKey="Peters S" first="Shanan E." last="Peters">Shanan E. Peters</name>
<affiliation wicri:level="2"><nlm:aff id="aff1"><addr-line>Department of Geoscience, University of Wisconsin-Madison, Madison, Wisconsin, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Geoscience, University of Wisconsin-Madison, Madison, Wisconsin</wicri:regionArea>
<placeName><region type="state">Wisconsin</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Zhang, Ce" sort="Zhang, Ce" uniqKey="Zhang C" first="Ce" last="Zhang">Ce Zhang</name>
<affiliation wicri:level="2"><nlm:aff id="aff2"><addr-line>Computer Sciences Department, University of Wisconsin-Madison, Madison, Wisconsin, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computer Sciences Department, University of Wisconsin-Madison, Madison, Wisconsin</wicri:regionArea>
<placeName><region type="state">Wisconsin</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Livny, Miron" sort="Livny, Miron" uniqKey="Livny M" first="Miron" last="Livny">Miron Livny</name>
<affiliation wicri:level="2"><nlm:aff id="aff2"><addr-line>Computer Sciences Department, University of Wisconsin-Madison, Madison, Wisconsin, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computer Sciences Department, University of Wisconsin-Madison, Madison, Wisconsin</wicri:regionArea>
<placeName><region type="state">Wisconsin</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Re, Christopher" sort="Re, Christopher" uniqKey="Re C" first="Christopher" last="Ré">Christopher Ré</name>
<affiliation wicri:level="2"><nlm:aff id="aff3"><addr-line>Department of Computer Science, Stanford University, Stanford, California, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Stanford University, Stanford, California</wicri:regionArea>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">25436610</idno>
<idno type="pmc">4250071</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4250071</idno>
<idno type="RBID">PMC:4250071</idno>
<idno type="doi">10.1371/journal.pone.0113523</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Pmc/Corpus">000016</idno>
<idno type="wicri:Area/Pmc/Curation">000016</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000062</idno>
<idno type="wicri:Area/Ncbi/Merge">000218</idno>
<idno type="wicri:Area/Ncbi/Curation">000218</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000218</idno>
<idno type="wicri:Area/Main/Merge">000116</idno>
<idno type="wicri:Area/Main/Curation">000115</idno>
<idno type="wicri:Area/Main/Exploration">000115</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">A Machine Reading System for Assembling Synthetic Paleontological Databases</title>
<author><name sortKey="Peters, Shanan E" sort="Peters, Shanan E" uniqKey="Peters S" first="Shanan E." last="Peters">Shanan E. Peters</name>
<affiliation wicri:level="2"><nlm:aff id="aff1"><addr-line>Department of Geoscience, University of Wisconsin-Madison, Madison, Wisconsin, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Geoscience, University of Wisconsin-Madison, Madison, Wisconsin</wicri:regionArea>
<placeName><region type="state">Wisconsin</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Zhang, Ce" sort="Zhang, Ce" uniqKey="Zhang C" first="Ce" last="Zhang">Ce Zhang</name>
<affiliation wicri:level="2"><nlm:aff id="aff2"><addr-line>Computer Sciences Department, University of Wisconsin-Madison, Madison, Wisconsin, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computer Sciences Department, University of Wisconsin-Madison, Madison, Wisconsin</wicri:regionArea>
<placeName><region type="state">Wisconsin</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Livny, Miron" sort="Livny, Miron" uniqKey="Livny M" first="Miron" last="Livny">Miron Livny</name>
<affiliation wicri:level="2"><nlm:aff id="aff2"><addr-line>Computer Sciences Department, University of Wisconsin-Madison, Madison, Wisconsin, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computer Sciences Department, University of Wisconsin-Madison, Madison, Wisconsin</wicri:regionArea>
<placeName><region type="state">Wisconsin</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Re, Christopher" sort="Re, Christopher" uniqKey="Re C" first="Christopher" last="Ré">Christopher Ré</name>
<affiliation wicri:level="2"><nlm:aff id="aff3"><addr-line>Department of Computer Science, Stanford University, Stanford, California, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Stanford University, Stanford, California</wicri:regionArea>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint><date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><p>Many aspects of macroevolutionary theory and our understanding of biotic responses to global environmental change derive from literature-based compilations of paleontological data. Existing manually assembled databases are, however, incomplete and difficult to assess and enhance with new data types. Here, we develop and validate the quality of a machine reading system, PaleoDeepDive, that automatically locates and extracts data from heterogeneous text, tables, and figures in publications. PaleoDeepDive performs comparably to humans in several complex data extraction and inference tasks and generates congruent synthetic results that describe the geological history of taxonomic diversity and genus-level rates of origination and extinction. Unlike traditional databases, PaleoDeepDive produces a probabilistic database that systematically improves as information is added. We show that the system can readily accommodate sophisticated data types, such as morphological data in biological illustrations and associated textual descriptions. Our machine reading approach to scientific data integration and synthesis brings within reach many questions that are currently underdetermined and does so in ways that may stimulate entirely new modes of inquiry.</p>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Raup, Dm" uniqKey="Raup D">DM Raup</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bambach, Rk" uniqKey="Bambach R">RK Bambach</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Sepkoski, Jj" uniqKey="Sepkoski J">JJ Sepkoski</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Sepkoski, Jj" uniqKey="Sepkoski J">JJ Sepkoski</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Benton, Mj" uniqKey="Benton M">MJ Benton</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Alroy, J" uniqKey="Alroy J">J Alroy</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Jablonski, D" uniqKey="Jablonski D">D Jablonski</name>
</author>
<author><name sortKey="Roy, K" uniqKey="Roy K">K Roy</name>
</author>
<author><name sortKey="Valentine, Jw" uniqKey="Valentine J">JW Valentine</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kiessling, W" uniqKey="Kiessling W">W Kiessling</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Alroy, J" uniqKey="Alroy J">J Alroy</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Finnegan, S" uniqKey="Finnegan S">S Finnegan</name>
</author>
<author><name sortKey="Heim, Na" uniqKey="Heim N">NA Heim</name>
</author>
<author><name sortKey="Peters, Se" uniqKey="Peters S">SE Peters</name>
</author>
<author><name sortKey="Fischer, Ww" uniqKey="Fischer W">WW Fischer</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Blois, Jl" uniqKey="Blois J">JL Blois</name>
</author>
<author><name sortKey="Zarnetske, Pl" uniqKey="Zarnetske P">PL Zarnetske</name>
</author>
<author><name sortKey="Fitzpatrick, Mc" uniqKey="Fitzpatrick M">MC Fitzpatrick</name>
</author>
<author><name sortKey="Finnegan, S" uniqKey="Finnegan S">S Finnegan</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Alroy, J" uniqKey="Alroy J">J Alroy</name>
</author>
<author><name sortKey="Aberhan, M" uniqKey="Aberhan M">M Aberhan</name>
</author>
<author><name sortKey="Bottjer, Dj" uniqKey="Bottjer D">DJ Bottjer</name>
</author>
<author><name sortKey="Foote, M" uniqKey="Foote M">M Foote</name>
</author>
<author><name sortKey="Fursich, Ft" uniqKey="Fursich F">FT Fürsich</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Alroy, J" uniqKey="Alroy J">J Alroy</name>
</author>
<author><name sortKey="Marshall, Cr" uniqKey="Marshall C">CR Marshall</name>
</author>
<author><name sortKey="Bambach, Rk" uniqKey="Bambach R">RK Bambach</name>
</author>
<author><name sortKey="Bezusko, K" uniqKey="Bezusko K">K Bezusko</name>
</author>
<author><name sortKey="Foote, M" uniqKey="Foote M">M Foote</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ferrucci, Da" uniqKey="Ferrucci D">DA Ferrucci</name>
</author>
<author><name sortKey="Brown, E" uniqKey="Brown E">E Brown</name>
</author>
<author><name sortKey="Chu Carroll, J" uniqKey="Chu Carroll J">J Chu-Carroll</name>
</author>
<author><name sortKey="Fan, J" uniqKey="Fan J">J Fan</name>
</author>
<author><name sortKey="Gondek, D" uniqKey="Gondek D">D Gondek</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Kumar, A" uniqKey="Kumar A">A Kumar</name>
</author>
<author><name sortKey="Niu, F" uniqKey="Niu F">F Niu</name>
</author>
<author><name sortKey="Re, C" uniqKey="Re C">C Ré</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Krishnamurthy, R" uniqKey="Krishnamurthy R">R Krishnamurthy</name>
</author>
<author><name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author><name sortKey="Raghavan, S" uniqKey="Raghavan S">S Raghavan</name>
</author>
<author><name sortKey="Reiss, F" uniqKey="Reiss F">F Reiss</name>
</author>
<author><name sortKey="Vaithyanathan, S" uniqKey="Vaithyanathan S">S Vaithyanathan</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Govindaraju, V" uniqKey="Govindaraju V">V Govindaraju</name>
</author>
<author><name sortKey="Zhang, C" uniqKey="Zhang C">C Zhang</name>
</author>
<author><name sortKey="Re, C" uniqKey="Re C">C Ré</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Niu, F" uniqKey="Niu F">F Niu</name>
</author>
<author><name sortKey="Recht, B" uniqKey="Recht B">B Recht</name>
</author>
<author><name sortKey="Re, C" uniqKey="Re C">C Ré</name>
</author>
<author><name sortKey="Wright, Sj" uniqKey="Wright S">SJ Wright</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Zhang, C" uniqKey="Zhang C">C Zhang</name>
</author>
<author><name sortKey="Re, C" uniqKey="Re C">C Ré</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Recht, B" uniqKey="Recht B">B Recht</name>
</author>
<author><name sortKey="Re, C" uniqKey="Re C">C Ré</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Niu, F" uniqKey="Niu F">F Niu</name>
</author>
<author><name sortKey="Re, C" uniqKey="Re C">C Ré</name>
</author>
<author><name sortKey="Doan, A" uniqKey="Doan A">A Doan</name>
</author>
<author><name sortKey="Shavlik, J" uniqKey="Shavlik J">J Shavlik</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Wainwright, Mj" uniqKey="Wainwright M">MJ Wainwright</name>
</author>
<author><name sortKey="Jordan, Mi" uniqKey="Jordan M">MI Jordan</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Callison Burch, C" uniqKey="Callison Burch C">C Callison-Burch</name>
</author>
<author><name sortKey="Dredze, M" uniqKey="Dredze M">M Dredze</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Mintz, M" uniqKey="Mintz M">M Mintz</name>
</author>
<author><name sortKey="Bills, S" uniqKey="Bills S">S Bills</name>
</author>
<author><name sortKey="Snow, R" uniqKey="Snow R">R Snow</name>
</author>
<author><name sortKey="Jurafsky, D" uniqKey="Jurafsky D">D Jurafsky</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hoffmann, R" uniqKey="Hoffmann R">R Hoffmann</name>
</author>
<author><name sortKey="Zhang, C" uniqKey="Zhang C">C Zhang</name>
</author>
<author><name sortKey="Weld, Ds" uniqKey="Weld D">DS Weld</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kopcke, H" uniqKey="Kopcke H">H Köpcke</name>
</author>
<author><name sortKey="Thor, A" uniqKey="Thor A">A Thor</name>
</author>
<author><name sortKey="Rahm, E" uniqKey="Rahm E">E Rahm</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Foote, M" uniqKey="Foote M">M Foote</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Miller, Ai" uniqKey="Miller A">AI Miller</name>
</author>
<author><name sortKey="Foote, M" uniqKey="Foote M">M Foote</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Alroy, J" uniqKey="Alroy J">J Alroy</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Sepkoski, Jj" uniqKey="Sepkoski J">JJ Sepkoski</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Adrain, Jm" uniqKey="Adrain J">JM Adrain</name>
</author>
<author><name sortKey="Westrop, Sr" uniqKey="Westrop S">SR Westrop</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ausich, Wi" uniqKey="Ausich W">WI Ausich</name>
</author>
<author><name sortKey="Peters, Se" uniqKey="Peters S">SE Peters</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Niu, F" uniqKey="Niu F">F Niu</name>
</author>
<author><name sortKey="Zhang, C" uniqKey="Zhang C">C Zhang</name>
</author>
<author><name sortKey="Re, C" uniqKey="Re C">C Ré</name>
</author>
<author><name sortKey="Shavlik, Jw" uniqKey="Shavlik J">JW Shavlik</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Van Noorden, R" uniqKey="Van Noorden R">R Van Noorden</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Smith, Ab" uniqKey="Smith A">AB Smith</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Peters, Se" uniqKey="Peters S">SE Peters</name>
</author>
<author><name sortKey="Foote, M" uniqKey="Foote M">M Foote</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Patterson, C" uniqKey="Patterson C">C Patterson</name>
</author>
<author><name sortKey="Smith, Ab" uniqKey="Smith A">AB Smith</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Dubois, A" uniqKey="Dubois A">A Dubois</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Laurin, M" uniqKey="Laurin M">M Laurin</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Payne, Jl" uniqKey="Payne J">JL Payne</name>
</author>
<author><name sortKey="Boyer, Ag" uniqKey="Boyer A">AG Boyer</name>
</author>
<author><name sortKey="Brown, Jh" uniqKey="Brown J">JH Brown</name>
</author>
<author><name sortKey="Finnegan, S" uniqKey="Finnegan S">S Finnegan</name>
</author>
<author><name sortKey="Kowalewski, M" uniqKey="Kowalewski M">M Kowalewski</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Laurin, M" uniqKey="Laurin M">M Laurin</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Finarelli, Ja" uniqKey="Finarelli J">JA Finarelli</name>
</author>
<author><name sortKey="Flynn, Jj" uniqKey="Flynn J">JJ Flynn</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Slater, Gj" uniqKey="Slater G">GJ Slater</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>Californie</li>
<li>Wisconsin</li>
</region>
</list>
<tree><country name="États-Unis"><region name="Wisconsin"><name sortKey="Peters, Shanan E" sort="Peters, Shanan E" uniqKey="Peters S" first="Shanan E." last="Peters">Shanan E. Peters</name>
</region>
<name sortKey="Livny, Miron" sort="Livny, Miron" uniqKey="Livny M" first="Miron" last="Livny">Miron Livny</name>
<name sortKey="Re, Christopher" sort="Re, Christopher" uniqKey="Re C" first="Christopher" last="Ré">Christopher Ré</name>
<name sortKey="Zhang, Ce" sort="Zhang, Ce" uniqKey="Zhang C" first="Ce" last="Zhang">Ce Zhang</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000115 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000115 | SxmlIndent | more
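Outside the Dilib toolchain, the record shown above can also be read with a general-purpose XML parser. The following is a minimal Python sketch, assuming the record has been saved as text. The record uses the `wicri:` and `nlm:` prefixes without declaring them, so the sketch wraps it in a hypothetical `wrapper` element that binds both prefixes to placeholder URIs. Those URIs, the helper names, and the shortened `SAMPLE` record are illustrative assumptions and not part of Dilib.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder namespace URIs. The record does not declare the
# wicri: and nlm: prefixes, and a strict parser rejects unbound prefixes.
WRAPPER_OPEN = (
    '<wrapper xmlns:wicri="urn:example:wicri" '
    'xmlns:nlm="urn:example:nlm">'
)
WRAPPER_CLOSE = "</wrapper>"


def parse_record(fragment: str) -> ET.Element:
    """Parse a <record> fragment and return the <record> element."""
    root = ET.fromstring(WRAPPER_OPEN + fragment + WRAPPER_CLOSE)
    record = root.find("record")
    if record is None:
        raise ValueError("no <record> element found")
    return record


def identifiers(record: ET.Element) -> dict:
    """Map each <idno> type attribute (pmid, doi, ...) to its text."""
    return {idno.get("type"): idno.text for idno in record.iter("idno")}


def authors(record: ET.Element) -> list:
    """Return author names listed in <titleStmt>, in document order."""
    title_stmt = record.find(".//titleStmt")
    if title_stmt is None:
        return []
    return [name.text for name in title_stmt.iter("name")]


# Shortened sample in the same shape as the record above.
SAMPLE = (
    "<record><TEI><teiHeader><fileDesc><titleStmt>"
    '<author><name sortKey="Peters, Shanan E">Shanan E. Peters</name>'
    '<affiliation wicri:level="2"><nlm:aff id="aff1"/>'
    '<country xml:lang="fr">États-Unis</country></affiliation></author>'
    "</titleStmt><publicationStmt>"
    '<idno type="pmid">25436610</idno>'
    '<idno type="doi">10.1371/journal.pone.0113523</idno>'
    "</publicationStmt></fileDesc></teiHeader></TEI></record>"
)

if __name__ == "__main__":
    rec = parse_record(SAMPLE)
    print(identifiers(rec))
    print(authors(rec))
```

The `xml:lang` attribute needs no declaration because the `xml` prefix is predefined. Restricting `authors` to `<titleStmt>` avoids counting the same names again from `<sourceDesc>` and from the bibliography.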
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= PMC:4250071 |texte= A Machine Reading System for Assembling Synthetic Paleontological Databases }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i -Sk "pubmed:25436610" \
  | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd \
  | NlmPubMed2Wicri -a OcrV1
This area was generated with Dilib version V0.6.32.